Fix excessive logging in create_pr.py that creates 43MB+ log files by mohammedahmed18 · Pull Request #2000 · codeflash-ai/codeflash

mohammedahmed18 · 2026-04-05T03:20:21Z

Problem

Line 38 of create_pr.py logged every key of the function_to_tests dictionary via list(function_to_tests.keys()). For large codebases such as budibase (1012 functions), a single DEBUG statement prints more than a thousand function names, which produces massive log files (43MB+).

Evidence

Trace ID: 3d2ad2f0-254a-4401-9c93-84f691acabf0 (43MB log, 534K lines)
Location: Line 533922 of the log shows a list of 1000+ function keys in a single log entry
Impact: Affects 4/22 logs (18%) in recent optimization run
Size: Each occurrence adds ~100KB to log file

Root Cause

The debug logging statement at line 38 was written with small projects in mind and became a problem on monorepos containing hundreds of packages:

# Before (buggy):
logger.debug(f"[PR-DEBUG] function_to_tests keys: {list(function_to_tests.keys())}")

Fix

Changed to log only the count:

# After (fixed):
logger.debug(f"[PR-DEBUG] function_to_tests has {len(function_to_tests)} keys")

This reduces log output from ~100KB to ~50 bytes per call.
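The size difference can be reproduced with a small sketch. The key names below are made up for illustration (the real keys in function_to_tests are not shown in this PR), so the absolute numbers will differ from the ~100KB reported above; only the relative gap is the point.

```python
# Hypothetical stand-in for the real mapping: 1012 invented qualified names.
function_to_tests = {
    f"packages/server/src/module_{i}.py:function_{i}": set() for i in range(1012)
}

# Message built the old way: the full key list is rendered into the string.
before = f"[PR-DEBUG] function_to_tests keys: {list(function_to_tests.keys())}"

# Message built the new way: only the number of keys is rendered.
after = f"[PR-DEBUG] function_to_tests has {len(function_to_tests)} keys"

print(f"before: {len(before)} characters")
print(f"after:  {len(after)} characters")
```

The old message grows linearly with the number of functions, while the new one grows only with the number of digits in the count.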

Testing

✅ Added 2 regression tests in test_create_pr_logging_bug.py
✅ Tests verify count is logged, not full key list
✅ Tests verify log output stays under 10KB (vs 100KB+ before)
✅ All existing tests pass
✅ Linting passes (uv run prek)

Impact

Severity: MEDIUM (doesn't break functionality, but bloats logs)
Type: Systematic (reproducible on every optimization run with large codebases)
Benefits: Significantly reduces log file sizes for monorepo projects

Related Issues

Part of comprehensive bug-fixing effort for Codeflash optimization stability.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

claude · 2026-04-05T03:20:54Z

Claude finished @mohammedahmed18's task in 2m 18s

PR Review Summary

Triage PR size — SMALL (1 line of production code changed)
Run lint/typecheck — prek passed; mypy has 4 errors in the new test file
Resolve stale threads — none found
Review code
Duplicate detection
Post summary

Prek Checks

uv run prek run --from-ref origin/main — passed (no issues).

mypy reports 4 errors in the new test file (test_create_pr_logging_bug.py) — missing return type annotations and a missing type annotation for a dict variable. These are in test code and won't affect runtime behavior.
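As a rough illustration of the kind of annotations mypy is asking for, here is a minimal sketch. The helper name, the test name, and the dict[str, set[str]] value type are assumptions made for this example; the actual contents of test_create_pr_logging_bug.py are not shown in the review.

```python
from __future__ import annotations


def build_function_to_tests(count: int) -> dict[str, set[str]]:
    """Hypothetical helper that builds a mapping with `count` empty entries."""
    # An empty dict literal needs an explicit annotation so mypy knows its type.
    function_to_tests: dict[str, set[str]] = {}
    for i in range(count):
        function_to_tests[f"module.function_{i}"] = set()
    return function_to_tests


def test_mapping_has_expected_size() -> None:
    # Test functions return nothing, so the return annotation is `None`.
    mapping = build_function_to_tests(3)
    assert len(mapping) == 3
```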

Code Review

The fix itself is correct. Changing list(function_to_tests.keys()) → len(function_to_tests) resolves the 43MB log issue.

However, there are two broader concerns:

1. The `[PR-DEBUG]` statements are temporary debug code — they should all be removed

codeflash/result/create_pr.py contains 18+ [PR-DEBUG]-prefixed logger.debug() calls. The [PR-DEBUG] prefix is a hallmark of ad-hoc debugging code that was never cleaned up. Fixing only line 38 leaves the rest in production:

Line 47 (loop): logger.debug(f"[PR-DEBUG] test_file: {tf.tests_in_file.test_file}, test_type={tf.tests_in_file.test_type}") — iterates over all test files for every function; still O(n) output
Lines 69–77 (loop): Two debug lines per registry entry, called inside existing_tests_source_for which can be called per-function
Lines 134–141: Per-invocation-id debug lines inside the main processing loop

The right fix is to remove all [PR-DEBUG] statements, not patch the worst one. They're noise in the debug log and contribute to bloat at scale.
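If any of these diagnostics turns out to be worth keeping after the cleanup, one option is to emit a single summary line instead of one line per item. The sketch below is only an illustration of that idea: the function name, the logger name, and the (path, test_type) tuple shape are hypothetical and do not come from create_pr.py.

```python
import logging
from collections import Counter

logger = logging.getLogger("pr_debug_sketch")  # hypothetical logger name


def log_test_file_summary(test_files: list[tuple[str, str]]) -> str:
    """Log one summary line for a list of (path, test_type) pairs."""
    # Count entries per test type instead of logging each file separately.
    counts = Counter(test_type for _, test_type in test_files)
    summary = ", ".join(f"{name}={n}" for name, n in sorted(counts.items()))
    message = f"{len(test_files)} test files ({summary})"
    # Lazy %-style formatting: the string is only rendered if DEBUG is enabled.
    logger.debug("%s", message)
    return message
```

With this shape, the output size depends on the number of distinct test types rather than on the number of test files.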


2. Test file is overbuilt for a 1-line change

test_create_pr_logging_bug.py is 147 lines for a fix that just changes one log format string. The two tests are near-identical (large dict vs small dict) and both verify the same thing. The size assertion (< 10000 bytes) is also brittle — it ties test correctness to output byte counts that will vary with Python version and log handler configuration.

A simpler, more direct test would just assert the log message format directly using caplog.
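A minimal sketch of such a test follows. It uses the standard library's unittest assertLogs so that it runs without extra dependencies; under pytest, the caplog fixture would play the same role. The log_function_to_tests wrapper and the logger name are stand-ins invented for this example, since the real call site lives inside create_pr.py.

```python
import logging
import unittest

logger = logging.getLogger("create_pr_sketch")  # hypothetical logger name


def log_function_to_tests(function_to_tests: dict) -> None:
    # Stand-in for the fixed logging line from the PR.
    logger.debug(f"[PR-DEBUG] function_to_tests has {len(function_to_tests)} keys")


class LogFormatTest(unittest.TestCase):
    def test_logs_count_not_keys(self) -> None:
        mapping = {f"module.function_{i}": set() for i in range(1012)}
        # Capture DEBUG records emitted on this logger during the call.
        with self.assertLogs(logger, level="DEBUG") as captured:
            log_function_to_tests(mapping)
        # Assert on the exact message rather than on byte counts.
        self.assertEqual(
            [record.getMessage() for record in captured.records],
            ["[PR-DEBUG] function_to_tests has 1012 keys"],
        )
```

Asserting on the message text avoids the byte-count threshold, so the test does not depend on handler configuration.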

Duplicate Detection

No duplicates detected. The changed function existing_tests_source_for has no equivalent in other modules.

Test Coverage

Tests pass. Coverage for the changed production line is confirmed by the added tests.

mohammedahmed18 wants to merge 1 commit into main from fix/excessive-logging-create-pr
